PREFACE

DATA for this report are taken in part from research conducted at The
Ohio State University under the auspices of the National Research
Council Committee on Selection and Training of Aircraft Pilots, with
funds provided by the Civil Aeronautics Administration.

Each of the authors is indebted to Dr. Floyd C. Dockeray of The Ohio
State University for his valuable counsel and guidance in the designing
and conducting of the experiments.
FOREWORD


UNTIL  comparatively  recently  there 
has  been  little  agreement  as  to 
what  constitutes  a  successful  airplane 
pilot.  Likewise,  although  there  have  been 
many  attempts  to  devise  methods  for  the 
selection  of  individuals  who  if  given 
proper  training  would  be  able  to  pilot 
airplanes  with  some  degree  of  com- 
petence, the  field  has  been  only  partially 
explored. 

Psychological  research  in  pilot  selec- 
tion began  during  the  first  world  war.  An 
excellent  review  of  the  type  of  work  done 
during  that  period  is  given  by  Dockeray 
and  Isaacs  (3).  Much  of  the  early  work 
was  of  doubtful  direct  value  because  of 
lack  of  opportunity  for  validation  in 
actual  flight  training  circumstances,  and, 
in  some  cases,  imperfect  design,  lack  of 
adequate  data  and  improper  statistical 
treatment  (18).  It  was,  however,  a  definite 
contribution  and  pointed  out,  in  many 
cases,  the  direction  for  later  research. 

After  the  cessation  of  hostilities  inter- 
est waned,  and  by  1939  Jenkins  (6)  re- 
ports that  no  psychologist  was  working 
in  the  field  of  aviation  psychology.  Re- 
search in  the  field  was  not  revived  until 
the  present  conflict  became  imminent. 
However,  in  the  interim  between  the  two 
wars,  laboratory  work  in  the  field  of  psy- 
chology was  done,  some  of  which,  when 
the  occasion  arose,  proved  to  be  of  value 
in  the  selection  of  pilots,  even  though 
originally  it  had  not  been  undertaken 
with  this  in  view.  However,  as  had  been 
the  case  with  a  large  part  of  the  early 
pilot  selection  research,  the  value  of 
much  of  this  was  not  immediately  known 
because  of  a  lack  of  opportunity  to  vali- 
date the  findings  in  actual  flight  training 
circumstances. 

It is recognized that the research done
in recent years on analysis of piloting has
resulted  in  a  large  amount  of  pertinent 
information.  Likewise,  research  per- 
formed on  the  isolation  of  the  personal 
characteristics  which  are  involved  in  fly- 
ing has  yielded  acceptable  evidence  as  to 
their  importance.  Much  of  this  informa- 
tion, however,  has  been  related  to  mili- 
tary aviation  as,  for  example,  the  work 
of  Carlson  (1)  and  Delucchi  (2)  where 
interest  has  centered  in  the  selection  of 
military  pilots.  With  the  exception  of 
the  research  projects  which  have  been 
done  under  the  auspices  of  the  National 
Research  Council,  Committee  on  Selec- 
tion and  Training  of  Aircraft  Pilots  (25), 
few  investigations  of  a  psychological  na- 
ture have  been  concerned  with  the  prob- 
lems of  the  civilian  aviator.  In  view  of 
the  widespread  interest  in  private  flying, 
and with the current liberal selection
standards  for  candidates  for  the  private 
pilot's  license,  it  seems  more  than  ever 
advisable  to  examine  the  selection  pro- 
cedures which  are  available.  That  this  is 
a task for the psychologist has been expressed
well by Kellum (11) who says
that  the  real  problem  of  the  selection  of 
aviators  begins  after  the  physical  exami- 
nation has  been  given.  He  points  out 
that  a  large  percentage  of  candidates  who 
pass  the  physical  examination  eventually 
fail  in  flight  training,  and  the  reasons 
for  failure  are  non-physical. 

One  of  America's  foremost  investiga- 
tors in  the  field  of  aviation  selection  dur- 
ing the  recent  war,  Liljencrantz  (14),  de- 
fined pilot  selection  as  follows: 

The  process  of  selection  may  be  conceived 
as  consisting  of  the  administration  of  a  test 
or  a  single  group  or  battery  of  tests,  on  the 
basis  of  which  a  dependable  decision  can  be 
reached  as  to  an  applicant's  aptitude  for 
aviation. 


The  abilities  required  in  successful 
piloting  are  best  described  as  a  complex 
of  coordinations,  skills,  and  abilities. 
Therefore,  adequate  pilot  selection 
would best be accomplished through a
method  which  combined  various  meas- 
urements of  the  components  of  this  com- 
plex. Thus  far,  no  combination  of  selec- 
tion techniques  has  proved  to  be  com- 
pletely valid,  nor  has  any  test  battery  by 
itself  or  when  combined  with  other  selec- 
tion techniques  such  as  the  informal 
interview  or  application  blank,  reached 
a  point  wherein  its  validity  could  not  be 
improved. The question might be asked
as to whether or not a test battery can
be  assembled  to  measure  the  various  fac- 
tors involved  in  learning  to  fly  and  if 
such  a  battery  might  be  used  to  predict 
the  success  of  candidates  in  learning  to 
fly  light  airplanes.  If  any  combination 
of  selection  tools  can  be  made  that  serves 
the  purpose  of  accurate  prediction  (of 
ultimate  success  or  failure),  this  com- 
bination should  prove  to  be  more  eco- 
nomical and  valuable  to  use  than  any 
one  of  the  component  predictors  by  it- 
self, or  than  the  selection  procedures 
already  in  use. 


STUDIES  IN  PILOT  SELECTION 
I.    Prediction  of  Success  in  Learning  to  Fly  Light  Aircraft 

G.  GORHAM  LANE 
The  Ohio  State  University 


STATEMENT OF THE PROBLEM

IT  WAS  on  the  basis  of  the  foregoing  dis- 
cussion that  the  present  research  was 
undertaken.  A  study  was  planned,  first  of 
all,  which  would  make  it  possible  to 
analyze  some  of  the  factors  involved  in 
the  determination  of  success  in  learning 
to  fly  light  aircraft.  Various  tests  were 
available  which  had  already  been  shown 
to  measure  factors  involved  in  learning 
to  fly,  and  a  new  test  was  added.  This  test 
was  designed  to  measure  factors  not  con- 
sidered by  any  of  the  other  tests.  It  was 
believed  that  these  tests  could  be  assem- 
bled into  a  battery  not  only  which  would 
measure  some  of  the  factors  involved  in 
learning  to  fly,  but  which  would  also  be 
of  some  use  in  predicting  success  or  fail- 
ure in  flight  training.  It  was  also  believed 
that  it  would  be  possible  to  show  how 
much  each  of  the  factors  measured  by  the 
tests  contributed  to  the  prediction  of 
success  or  failure  in  learning  to  fly. 
Finally,  it  was  believed  that  the  predic- 
tive value  of  the  test  battery  might  best 
be  examined  if  several  types  of  criteria 
were  used.  It  seemed  possible  that  a  test 
battery  might  have  high  predictive  value 
for  a  criterion  of  success  in  specific 
maneuvers,  but  be  worthless  in  the  pre- 
diction of  over-all  success  or  failure. 

SUBJECTS USED IN THE INVESTIGATION

Thirty-seven  male  subjects  between  the 
ages  of  17  and  29  years  were  used  in  this 
investigation.  All  were  enrolled  in  the 
National Research Council flight training program at the Ohio State
University, and received their flight training between the middle of
October, 1945, and
the  first  of  March,  1946.  Twenty-nine 
men  were  either  enrolled  in  college,  or 
had  received  bachelor's  degrees  and  had 
research  positions  on  the  campus. 

SELECTION TESTS USED IN THE PRESENT INVESTIGATION

All  applicants  for  flight  training  under 
the  experimental  program  were  given  a 
battery  of  tests.  These  tests  were  as 
follows: 

1. The Self-Administering Test of Mental Ability (Gamma A. M. Otis quick-scoring)
2. The Ohio State Psychological Examination (Form 22)
3. Test of Aviation Information (Form P)
4. Biographical Inventory (Form 2C Civilian Key)
5. Test of Mechanical Comprehension (Form B, CAA)
6. Desire to Fly (Form XPA)
7. Mashburn Serial Reaction Test
8. Two-Hand Coordination Test
9. Judgment-Reaction Test

In  the  selection  of  tests  for  inclusion  in 
this  predictor  battery,  an  attempt  was 
made  to  include  tests  which  were  easy 
and  economical  to  administer,  closely  re- 
lated to  aviation  and  which  produced 
results  capable  of  being  interpreted  in  a 
straight-forward  manner.  In  addition,  the 
following  considerations  were  utilized. 

The  role  of  general  intelligence  in 
flight  success  has  been  widely  recognized 
(17),  and  the  Test  of  Mental  Ability  of 
the C.A.A. fulfills the requirements of
being  a  reliable  instrument  which  can 
be  used  economically  from  the  point  of 
view of both time and money. A test-retest
measure reported for applicants for
primary  and  secondary  war  service  train- 
ing under  the  C.A.A.  is  .79  (21).  In  ad- 
ministering the  test,  a  twenty-minute 
time  limit  was  used,  and  raw  scores  were 
utilized  in  statistical  computation. 

Since one study at least (10) has indicated that there is some
relationship between grades in college and success in flight training,
it was decided to include some such measure in the current battery.
The N.R.C. Flight Training Course
at  the  Ohio  State  University  was  open  to 
non-students  as  well  as  to  students;  there- 
fore, scholastic  grades  were  not  available 
for  all  individuals  taking  part  in  the 
study. However, the Ohio State Psychological Examination was
available, and this test has been shown to have a correlation of .606
with first semester scholastic grades for male students (24).
According to a verbal report made by Dr.
Herbert  Toops,  author  of  this  test,  the 
reliability  coefficients  which  have  been 
calculated  for  this  test  are  approximately 
.94.  Grades  are  expressed  in  percentiles, 
based  upon  current  norms  at  the  Ohio 
State  University.  Percentiles  were  used  in 
the  statistical  portion  of  this  study. 

The  Test  of  Aviation  Information  (AI) 
was  developed  in  1941-43  at  Wesleyan 
University  and  the  University  of  Roches- 
ter, under  the  sponsorship  of  the  Com- 
mittee on  Selection  and  Training  of  Air- 
craft pilots,  NRC.  Preliminary  work  done 
on  this  test  indicates  that  "it  can  be  in- 
cluded as  one  of  the  more  promising  pre- 
dictors developed  in  the  Committee's  re- 
search program"  (25).  The  test  contains 
a  total  of  200  questions  concerned  with 
aviation.  The  raw  score  was  used  in  all 
statistical  computations.  Reliability  co- 
efficients of  .751  and  .771  have  been  re- 
ported for  this  test  (21). 


The  Biographical  Inventory  is  other- 
wise known  as  the  Inventory  of  Personal 
Data  for  Prospective  Pilots  (27).  Regard- 
ing this  inventory,  Viteles  has  said  (25): 

The  Biographical  Inventory  represents  one 
of  the  first,  if  not  the  first,  successful  attempt 
to  predict  pilot  proficiency  from  biographical 
data. 

Also,  it  should  be  remembered  that 
biographical  data  have  often  been  used 
in  informal  interviews.  Psychiatric 
examination  of  candidates  for  flight 
training  often  includes  questions  dealing 
with  the  individual's  past  history.  John- 
son (8)  has  pointed  out  that  biographical 
data,  if  properly  used,  are  extremely  valu- 
able in  the  selection  of  individuals  for 
aeronautical  training. 

A  newly  created  Civilian  key  (The 
Kelly  Positive  Key)  was  used  in  scoring 
this  inventory,  and  raw  scores  were  used 
in  the  computation  of  statistical  results. 
Reliability  coefficients  of  .525  and  .603 
have  been  reported  previously  for  this 
inventory  (21). 

Since,  in  any  learning  situation  the 
factor  of  motivation  is  most  important, 
it  seemed  desirable  to  include  some  meas- 
ure of  the  student's  interest  in  learning 
to  fly.  The  Desire  to  Fly  Inventory  (12) 
was  developed  at  the  University  of 
Rochester,  and  preliminary  work  on  it 
has  indicated  that  it  has  practical  useful- 
ness in  a  battery  of  predictors  for  success 
in  flight  training.  It  contains  a  total  of 
235  questions  pertinent  to  interest  in  fly- 
ing which  are  to  be  answered  by  the 
applicant.  Key  A  B  was  used  in  scoring, 
and  raw  scores  were  utilized  in  the  cur- 
rent investigation. 

Although  no  reliability  coefficients  are 
available  for  the  Desire  to  Fly  Inventory, 
the  authors  indicate  that  it  is  reliable  in 
their  report  to  the  National  Research 
Council  (12).  In  this  report  they  state: 
An analysis of the distributions of items
answered  "No"  by  various  percentages  of  the 
populations  of  both  samples  A  and  B  indi- 
cated that  the  items  are  fairly  stable  in  the 
sense  that  each  was  answered  "No"  by  ap- 
proximately the  same  proportion  of  cases  in 
each  sample. 

The  Mashburn  Serial  Reaction  Ap- 
paratus is  one  of  the  older  psychomotor 
tests  still  being  used  in  pilot  selection. 
Actually  the  test  measures  the  individ- 
ual's ability  to  make  rapid  eye-hand-foot 
reactions.  A  complete  description  of  this 
apparatus  is  available  in  McFarland's  re- 
port No.  34  for  the  Airman  Development 
Division  of  the  C.A.A.  (15).  The  total 
time  required  for  making  a  series  of  forty 
eye-hand-foot  responses  was  obtained  and 
used  in  later  statistical  work.  In  the 
Boston-Midwest  Study  (20)  reliability 
coefficients  of  .53,  .74,  and  .74  were  found 
for  this  test  when  three  different  samples 
were  used. 

The  Two-Hand  Coordination  Test  was 
developed  from  two  tests  formerly  used 
in  industrial  selection,  namely,  the  Wis- 
consin Miniature  Engine-Lathe  Test,  and 
the Farmer-Chambers Coordination
Test.  A  complete  description  of  the  cur- 
rent version  of  the  test  is  available  (16). 
On  this  test  the  subject  is  given  six  trials 
and  is  scored  on  the  percentage  of  time 
he  maintains  contact  between  two  mov- 
ing discs.  The  preliminary  studies  on  this 
test  indicate  that  the  mean  of  six  trials 
was  found  to  correlate  .78,  .87,  .80  with 
trials  4,  5,  and  6.  Therefore,  the  mean 
score  for  each  applicant  was  used  in  this 
study.  Reliability  data  are  available  from 
the Boston-Midwest Study for three samples
and were found to be .75, .50, and
.80  (20). 
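
The scoring described above for the Two-Hand Coordination Test can be
illustrated with a minimal sketch. The trial percentages below are
invented for illustration only, the function names are hypothetical,
and the Pearson product-moment formula is assumed to be the
correlation intended, since the report does not name the coefficient.

```python
from math import sqrt


def mean_score(trials):
    """Mean percentage of time on target across one subject's trials."""
    return sum(trials) / len(trials)


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: six trials (percent time on target) for five subjects.
subjects = [
    [40.0, 45.0, 50.0, 55.0, 58.0, 60.0],
    [30.0, 33.0, 38.0, 41.0, 45.0, 44.0],
    [55.0, 60.0, 62.0, 66.0, 70.0, 71.0],
    [20.0, 28.0, 30.0, 36.0, 35.0, 40.0],
    [48.0, 50.0, 57.0, 59.0, 63.0, 65.0],
]

# Score used in the study: the mean of the six trials for each subject.
means = [mean_score(t) for t in subjects]

# Correlation of the six-trial mean with a single late trial (trial 5).
trial5 = [t[4] for t in subjects]
r = pearson_r(means, trial5)
print(round(r, 2))
```

A high value of `r` in such a comparison is the kind of evidence cited
for using the six-trial mean as the subject's score.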

The Test of Mechanical Comprehension (MC) contains seventy-six
questions with companion diagrams concerning mechanical problems.
Reliability coefficients of .697 and .743 have been reported (21).

In  view  of  the  fact  that,  in  the  past, 
tests  involving  reaction  time  have  shown 
relationships  greater  than  chance  to  fly- 
ing success,  it  seems  that  any  battery  of 
tests  constructed  for  the  purpose  of  pre- 
dicting success  in  flight  training  should 
be  heavily  weighted  with  tests  involving 
reaction  time.  Therefore,  in  the  present 
investigation,  a  new  test  was  included, 
which,  although  involving  a  measure- 
ment of  reaction  time,  was  improved 
from  the  points  of  view  of  economy  of 
construction,  and  of  simplicity  of  admin- 
istration and  scoring.  This  has  been 
designated  as  the  Judgment-Reaction 
Test. 

The  Judgment-Reaction  Test  was 
based  upon  an  apparatus  originally  de- 
signed by  Ranschburg  and  called  by  him 
a  "Mnemometer."  For  purposes  of  the 
present  experiment  the  basic  apparatus 
was  modified  and  improved.  As  used  in 
this  research,  the  test  consists  of  a  black 
wooden  box  approximately  15  inches 
square  in  which  the  revised  Ranschburg 
rotation  mechanism  was  placed.  The 
front  surface  of  the  box  contains  an 
aperture  through  which  the  stimulus  is 
visible  and  two  hand  switches  which  are 
operated  by  the  subject. 

Stimuli  consist  of  sixty  shades  of  Her- 
ing  grays  so  arranged  on  a  cardboard  disc 
that  as  the  correct  switch  is  pressed  by 
the  subject,  the  disc  moves  one-sixtieth 
of  its  circumference  and  reveals  a  new 
stimulus  through  the  aperture.  The  sub- 
ject is  required  to  make  a  judgment  as  to 
whether  the  shade  of  gray  which  has  be- 
come visible  is  darker  or  lighter  than  the 
one  which  was  visible  previously:  If  it  is 
darker,  he  must  press  the  switch  on  his 
right;  if  it  is  lighter,  he  must  press  the 
switch on his left. If he makes an incorrect response, the stimulus
does not change. Only the preferred hand is used.

The subject is required to make a series of sixty judgment reactions
and is scored on the quickness with which he completes the series. He
is given a series of six trials of sixty judgments each with a rest
pause of one minute between trials. In preliminary experiments done
with this test, a correlation of .90 was found between trials 4 and 5.
This was taken as the point at which the effects of initial practice
were at a minimum and also as an indication of test-retest
reliability. The time required to complete sixty judgment reactions in
the fifth trial was used as the score for the subject in this
investigation.

CRITERIA USED IN THE PRESENT INVESTIGATION

The criteria of success in flight training available in this study
were divided into two main types: gross criteria, or measures of
over-all success or failure in flight training; and specific criteria,
such as observations of good or poor performance on specific aspects
of flight performance. These are not mutually exclusive for it is
entirely possible that a student might perform above average on most
maneuvers, yet not be able to combine their performance into a smooth
pattern and, thus, be rated as a "poor" flier. Still again, a student
might be able to perform maneuvers and specific aspects of flight
well, yet be labeled an "unsafe" pilot. On the other hand, a student
might successfully pass his private flight examination by giving an
overall good performance, although inspection of his individual grades
might reveal that he was better in some aspects of flight performance
than in others.

There is also a large variation in the student's performance from day
to day, so it seemed advisable that any data collected on his
performance should be sampled over a period of time in order to obtain
as accurate a picture of his flight performance as possible.

Furthermore, there is the ever present difficulty of unreliability of
criteria obtained from ratings. One way by which it is possible to
minimize the error due to individual ratings is to combine ratings or
to secure ratings from different observers whenever possible.

Another consideration in the selection of criteria, especially in a
study such as this one involving flight performance, is the
practicality of collecting observations. This factor necessarily
limited the types of observations which could be collected.

Keeping in mind these principles, criterion data were obtained from
four sources:

1. The C.A.A. Flight Inspector
2. The student's instructor
3. The check flight pilot
4. The student's log book

Two measures were available from the C.A.A. Flight Inspector. These
were the overall grades on the private pilot test, and the demerit
score given on the student's landing performance.

The C.A.A. grade on the private pilot test represents the inspector's
opinion of the student's skill in piloting. It determines, in part,
whether or not the student is issued the private pilot's license.
Normally, this test, which comprises a series of maneuvers prescribed
by the Civil Aeronautics Administration, does not take place until the
student's instructor has recommended him as being ready for the
private pilot's license. However, in the experimental program, all
students received this test after they had completed thirty-five hours
of flight training.

Previous research has shown that the
inspector's  grade  on  this  test  has  a  sub- 
stantial relationship  to  some  of  the  pre- 
dictors which  were  used  in  the  current 
research (10).

It  was  mentioned  previously  that  in 
this  study  the  attempt  was  made  to  sam- 
ple criteria  of  two  types,  gross  and  spe- 
cific. Since  landing  is  one  of  the  most 
complex  and  difficult  of  all  maneuvers 
encountered  in  learning  to  fly,  it  was 
selected  to  represent  the  specific  criteria. 
The  following  quotation  from  an  Army 
Air  Forces  article  expresses  well  the 
reason  for  stressing  the  student's  ability 
to  land  a  plane  (23). 

Experts  are  of  the  definite  opinion  that 
landing  in  the  proper  place  in  the  proper 
attitude  without  dropping  the  plane  in  or 
bouncing  it  involves  important  aspects  of 
flying  skill,  namely,  the  ability  to  judge  space 
and  plan  a  course  through  it,  to  control  the 
attitude  and  airspeed  of  the  plane,  and  to 
feel  when  it  is  about  to  stall. 

One  requirement  of  the  private  pilot 
test  is  that  the  student  attempt  three  spot 
landings;  that  is,  attempt  three  times  to 
set  the  plane  on  the  ground  within  300 
feet  of  a  designated  spot  on  the  runway. 
Failure  to  do  this  in  two  out  of  three 
attempts  means  failure  in  the  entire 
examination. 
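
The spot-landing rule just stated can be sketched as a simple check.
This is an illustration only: the function name is hypothetical, and
treating a landing at exactly 300 feet as a success is an assumption,
since the text does not say how the boundary is handled.

```python
SPOT_LIMIT_FEET = 300  # distance limit quoted in the text


def passes_spot_landings(distances_feet):
    """Return True if at least two of three landings fall within the limit.

    distances_feet: distance of each touchdown from the designated spot.
    Assumption: a landing at exactly the limit counts as a success.
    """
    if len(distances_feet) != 3:
        raise ValueError("exactly three spot landings are attempted")
    hits = sum(1 for d in distances_feet if abs(d) <= SPOT_LIMIT_FEET)
    return hits >= 2


# Hypothetical touchdown distances for two students.
print(passes_spot_landings([120, 350, 280]))  # two landings within the limit
print(passes_spot_landings([400, 350, 100]))  # only one landing within the limit
```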

According  to  C.A.A.  regulations,  scores 
on  individual  maneuvers  such  as  this  are 
given  on  a  demerit  basis.  Demerit  scores 
range  from  one  through  five,  having 
values as follows:

1. Excellent (90-100)
2. Above average (85-90)
3. Average (80-85)
4. Below average (70-80)
5. Unsatisfactory (0-70)*
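
The demerit scale above can be expressed as a small lookup. This is a
sketch, not the C.A.A. procedure itself: the function name is
hypothetical, and because the published bands share their end points
(for example, 85 appears in two bands), the code assumes that a
boundary value is placed in the better band.

```python
def demerit_from_percentage(pct):
    """Map a percentage grade (0-100) to a demerit score from 1 to 5.

    Assumption: a grade on a shared boundary (90, 85, 80, 70) is
    assigned to the better of the two adjoining bands.
    """
    if not 0 <= pct <= 100:
        raise ValueError("percentage grade must lie between 0 and 100")
    if pct >= 90:
        return 1  # Excellent
    if pct >= 85:
        return 2  # Above average
    if pct >= 80:
        return 3  # Average
    if pct >= 70:
        return 4  # Below average
    return 5      # Unsatisfactory


for grade in (95, 87, 80, 72, 60):
    print(grade, demerit_from_percentage(grade))
```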

Although  the  student  has  to  attempt 
three  landings,  only  one  demerit  score, 
representing the average score on his
performance  is  given  for  landing.  This 
score  plus  the  overall  grade  on  the  pri- 
vate pilot  test  were  selected  to  represent 
the  C.A.A.  inspector's  opinion  of  the  stu- 
dent's flying  ability. 

Five  measures  were  available  from  the 
student's  flight  instructor  and  were  in- 
cluded in  the  initial  phases  of  this  study. 
The  first  four  of  these  measures  repre- 
sented the  instructor's  appraisal  of  the 
student's  skill  in  landing.  The  instructor 
kept  a  daily  check  sheet  on  the  student's 
performance  during  the  flight  lesson,  and 
each  maneuver  performed  was  graded  on 
a  percentage  basis,  with  70  being  re- 
garded as  passing.  In  order  to  obtain 
landing  data  on  each  student  from  the 
flight  instructor,  the  following  procedure 
was  devised. 

During  each  flight  lesson  students 
might  make  a  variable  number  of  land- 
ings. In  some  flight  lessons,  he  might 
make  none  at  all  without  assistance.  In 
order  to  obtain  an  adequate  sample  of 
the  instructor's  grades  on  unassisted  land- 
ings for  each  student,  an  average  grade 
was  computed  for  landings  made  by  the 
student  during  the  two  flights  immediate- 
ly preceding  each  of  the  four  check 
flights.  This  represented  the  most  satis- 
factory method  of  obtaining  instructor 
data  on  landings  and  these  data  are  com- 
parable on  a  time  basis  to  that  obtained 
from  the  check  pilots. 
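
The averaging procedure described above can be sketched as follows.
The data layout and function name are hypothetical, the grades are
invented, and pooling all unassisted landings from the two lessons
into one average (rather than averaging the two lesson averages) is an
assumption, since the text does not specify which was done.

```python
def pre_check_landing_average(lessons, lessons_before_check):
    """Average instructor grade on unassisted landings made in the two
    lessons immediately preceding a check flight.

    lessons: chronological list; each entry lists the percentage grades
        given to unassisted landings in that lesson (possibly empty).
    lessons_before_check: number of lessons flown before the check flight.
    Returns None if no unassisted landing was made in either lesson.
    """
    start = max(0, lessons_before_check - 2)
    window = lessons[start:lessons_before_check]
    grades = [g for lesson in window for g in lesson]
    return sum(grades) / len(grades) if grades else None


# Hypothetical record: five lessons, one of them with no unassisted landing.
lessons = [[72, 75], [], [80], [78, 82], [85, 90, 88]]

print(pre_check_landing_average(lessons, 3))  # uses lessons 2 and 3
print(pre_check_landing_average(lessons, 5))  # uses lessons 4 and 5
```

Repeating the call once for each of the four check flights yields the
four instructor landing measures used as criteria.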

Further  data  were  available  from  the 
instructor  in  the  form  of  ratings  on  a 
"Scale  for  Rating  Pilot  Competency." 
This  scale  was  developed  at  Purdue  Uni- 
versity (9)  and  a  factor  analysis  of  the 
scale  has  indicated  that  the  14  items  in 
the  scale  measure  three  distinct  factors, 
tentatively  called  "skill,"  "judgment," 
and  "emotional  control."  Preliminary 
work  on  the  scale  has  indicated  that  its 
use differentiates between the "best" and "poorest" students of a
large group of instructors from several different areas.

There is a possibility of scoring each item on 40 points. Since the
original experimentation done on this scale indicated that the most
reliable results were obtained by adding the scores on the three
factors, and combining them into three separate scores, this method
was followed in the present study. The total number of points on each
factor received by a student was regarded as his score.

The Ohio State Flight Inventory provided a major portion of the
criteria used in this study and comprised all of the data received
from the check pilots. The OSFI, as it is commonly called, was the
result of research initiated at the Ohio State University in 1939, in
an attempt to devise a standardized rating technique for use in making
observations of student pilot performance (4). The most recent version
of the OSFI was used (19).

This inventory was administered four times to each student during the
course of flight instruction. It was used during the check flights
which occurred at the end of seven, fifteen, twenty-five, and
thirty-five hours of flight instruction. Two check pilots were used to
administer this inventory, and these pilots flew with each student on
alternate check flights. The inventory was filled out during the
flight so that omissions due to forgetfulness or oversight might be
kept at a minimum. The check pilots flew with the student only during
the check flight, and thus, this opinion of the student's flight
performance represents an independent estimate.

Two types of data were available from the OSFI. These are referred to
as demerit scores and maneuver grades. Demerit scores refer to scores
based upon individual aspects of a maneuver, and grades represent an
over-all estimate of maneuver performance.

Each check flight required three ratings of the student's performance
in making a "Final Approach and Landing." Data on this maneuver were
divided into three main categories, "Control of Plane," "Precision,"
and "Safety." Each of these was broken down into subdivisions which
could be scored separately. In this way it was possible to secure a
detailed account of the student's performance on that particular
landing. Scores on this check sheet were obtained in the form of
demerit weights, ranging from one through three. If a particular
performance was satisfactory, no score was given.

For the present purposes, the total demerit score for each landing was
computed. Then an average was taken of the demerit scores on as many
landings as the student made. In no case was more than one landing
omitted, and this omission was generally due to incomplete scoring
because the check pilot had to take over the controls. Average demerit
scores were available on four check flights for each student used in
the study.

Each landing that the student made during the check flight was also
assigned an over-all grade by the check pilot. This grade reflected
his opinion of the student's over-all performance in landing the
plane. In all check flights the check pilot was asked to grade the
student on a percentage basis, using 70 as the fixed grade for a
passing score. At each stage in his training, the student was compared
with the performance expected of a certified private pilot; in other
words, he was graded on a fixed scale. The average over-all grade for
each landing made was computed for each of the four check flights.

In addition to making three regular landings during each check flight,
the student was asked to execute one "Landing with a Slip."
Performance on this maneuver was first scored on a demerit basis and
the total number of demerits was computed for each "Landing with a
Slip," making available a total of four sets of demerit scores.
Over-all grades for this maneuver were also available and were
included.

On the fourth check flight, at the conclusion of thirty-five hours of
flight instruction, the student was required to proceed to a strange
field and make a landing. This necessitated a somewhat different
approach to the field since instructions were to make a "power on
landing," whereas the other landings were made "power-off." Demerit
scores for this maneuver were available for the fourth check flight
only and were included in the criteria.

An independent gross criterion was obtained by keeping a record of the
length of time required before the student was allowed to solo.
Although popularly, such time measures as this have been thought to be
of significance, previous studies have shown that this particular
measure may not be too significant (25). "Time to Solo" records were
available for all students in terms of hours and minutes before
soloing, and were included as criteria.

Although the reliability of the criteria used in this investigation is
still being studied in other investigations, some data are available
which indicate that these criteria are suitable for use in connection
with pilot selection.

OSFI summation score were found to range from .83 to .95. This
indicates a marked consistency between the ratings secured from these
two sources. In a more recent report, Wapner and Bakan (26) have
attempted to demonstrate the relationship of inspector's grades on the
OSFI to photographic records of the student's performance. Their
conclusions were that although, in general, the accuracy of the
inspector's rating was high, there was some variation in the degree of
accuracy for certain maneuvers. However, they feel that their study
supplies empirical evidence to justify placing confidence in the
inspector's criterion measures. Actually, this shows that such ratings
can be accurate. And thus it seems justifiable to use raters who have
been well trained in using the OSFI, in research on pilot selection.

In an analysis done by Johnson and Boots (7), of ratings in the
preliminary phase of the CAA Training Program, it was found that
correlations between inspector's final ratings and instructor's mean
ratings on given maneuvers were low, even for the last two hours of
flight. Furthermore, the intercorrelations between instructor's mean
ratings for given maneuvers ranged from .29 to .93, with the higher
correlations tending to be between maneuvers most frequently rated.

RESULTS AND DISCUSSION OF RESULTS

The procedure usually adopted when it is desired to evaluate the
relative contribution of single tests to the predictive value of a
battery of tests is to calculate regression coefficients for the tests
involved. The method which is regarded as most satisfactory when
working with

In  a  study  reported  by  the  National  many  variables  is  that  originally  devised 

Research  Council  (19),  correlations  be-  by  Doolittle.  A  complete  description  of 

tween  inspector's  flight  grades  and  the  this  method  is  available  in  Peters  and 
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Van  Voorhis  (22)  pp.  226-234.  This  tech- 
nique was  utilized  in  this  study. 
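
The regression step described above can be illustrated with a minimal sketch. It is not the Doolittle hand-computation routine itself: it solves the same normal equations with a general linear solver from NumPy. The three intercorrelations are those reported in Table I for Mental Ability, OSPE, and Aviation Information; the validity coefficients in `r_xy` are hypothetical values chosen only for illustration.

```python
import numpy as np

def regression_weights(R_xx, r_xy):
    """Return standardized regression weights and the multiple correlation.

    R_xx: intercorrelations among the predictors (square, symmetric).
    r_xy: correlation of each predictor with the criterion.
    Solves the normal equations R_xx @ beta = r_xy, the same system the
    Doolittle routine solves by systematic hand elimination.
    """
    R_xx = np.asarray(R_xx, dtype=float)
    r_xy = np.asarray(r_xy, dtype=float)
    beta = np.linalg.solve(R_xx, r_xy)
    multiple_r = float(np.sqrt(beta @ r_xy))  # R^2 = sum of beta_i * r_iy
    return beta, multiple_r

# Intercorrelations from Table I: Mental Ability, OSPE, Aviation Information.
R_xx = [
    [1.000, 0.744, 0.420],
    [0.744, 1.000, 0.439],
    [0.420, 0.439, 1.000],
]
# HYPOTHETICAL validity coefficients (not taken from the study).
r_xy = [0.20, 0.25, 0.40]

beta, multiple_r = regression_weights(R_xx, r_xy)
print("weights:", np.round(beta, 3))
print("multiple R:", round(multiple_r, 3))
```

Because Mental Ability and OSPE are highly intercorrelated, the weights they receive can differ considerably from their zero-order validities, which is one reason regression weights rather than simple correlations are used to judge each test's contribution.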

Each test was selected for inclusion in the battery because it was believed to
measure some factor or factors involved in the ability to fly light aircraft.
Therefore, some measure of the relationships between the tests was necessary,
and intercorrelations were computed. Table I shows the zero order correlations
among all of the tests included in the battery of predictors. In general,
these correlation coefficients were not high, indicating that each test was
fairly independent of the others. The major exception to this was the
correlation of .744 between the O.S.P.E. and the test of Mental Ability. This
might have been expected, however, from the nature of the tests. Next to this
was the correlation of .625 between the two coordination tests, and this, too,
might have been expected from the nature of the two tests involved.

Table I
Intercorrelations of Predictors

                    Judgment    Mental
                    reaction   ability      OSPE      A.I.      B.I.      M.C.      D.F.     Mash.  Two-hand
Judgment reaction                 .315      .353      .289     -.005     -.047      .203      .228      .270
Mental ability                              .744      .420     -.017      .282     -.190      .376      .354
OSPE                                                  .439      .062      .285     -.079      .383      .398
A.I.                                                            .395      .419     -.208      .168      .276
B.I.                                                                      .204      .180      .066     -.013
M.C.                                                                                .087      .411      .526
D.F.                                                                                          .145      .067
Mash.                                                                                                   .625
Two-hand

Table II shows the intercorrelations among the criteria. As in the case of the
predictors, the intercorrelations were, in general, low. The highest
correlation was .725, between the same instructor's "scores" and "grades" on
landings with a slip on the first check flight. Sixteen of the correlations
were above .50. For the most part the higher correlations were found between
grades and scores given by the same pilot on the same maneuver, thus giving an
indication of the reliability of these ratings. Table III, below, shows the
correlations which obtained between demerit scores and grades on landings when
both measures were obtained from the same instructor.

All variables were coded in such a way that a positive correlation indicates a
positive relationship between proficiency in both variables, with the
exception of the C.A.A. landing grade. Negative correlation in this case
indicates a positive relationship in proficiency.

Table III
Correlations Between Demerit Scores and Grades on Landings
(Same Instructor)

                  Landing grades
First check            .409
Second check           .580
Third check            .634
Fourth check           .662

The correlation between the two measures from the C.A.A. Inspector's flight
test was -.512. A relationship such as this might be expected as the landing
grade is one of the most important determinants of the over-all grade.
Evidence that a positive relationship does obtain between these two grades is
furnished in a study done on the analysis of Inspector's ratings by Festinger
(5) in which it was found that among the most uniformly high correlations
between mean maneuver grades and over-all grades on the private pilot flight
test were those obtained from precision landings. These correlations are given
below:

                      Inspector C        Inspector D
Overall: Landing      .68  N = 27        .65  N = 28
Overall: Landing      .57  N = 28        .73  N = 28

Although  two  of  the  factors  contained 
in  the  Purdue  Rating  Scale  showed  low 
positive  relationship  with  one  another, 
the correlation between ratings on "Emotional
Control" and "Skill" was exceptionally
high. It is quite possible that an
emotionally  stable  individual  would  not 
be  tense  in  handling  the  controls  of  a 
plane,  and  it  is  well  known  that  much  of 
the  flight  instructor's  time  is  spent  in 
attempting  to  get  students  to  relax. 
Therefore,  it  is  probably  to  be  expected 
that  ratings  in  skill  and  emotional  con- 
trol would  be  positively  related. 

The  correlations  between  the  demerit 
scores  given  on  "Landings"  in  the  four 
check  flights  were  not  high,  indicating 
that  probably  the  factors  involved  were 
fairly  independent  of  one  another.  It  is 
probable  that  landing  performance 
actually  changes  as  the  student  has  more 
flight  instruction. 

The  correlations  between  demerit 
scores  on  "Strange  Field  Landings"  and 
other  ratings  were  low  with  the  exception 
of  those  between  demerit  scores  and 
grades  on  "Landings  with  a  Slip"  in  the 
fourth  check  flight.  Actually  this  affords 
an  indication  of  the  reliability  of  the 
ratings  as  "Strange  Field  Landings"  were 
administered  only  in  the  fourth  check 


flight.  Thus  the  same  individual  was  re- 
sponsible for  these  two  ratings.  The  high 
correlation  might  also  indicate  that  the 
same  type  of  abilities  are  involved  in 
making  landings  with  a  slip  and  strange 
field  landings. 

The  correlations  between  instructor's 
grades  on  landings  were  all  positive  but 
not  high.  The  highest  was  a  correlation 
of  .539  between  the  average  grades  on 
landings  preceding  the  second  and  third 
check  flights.  The  highest  correlation  be- 
tween instructor's  grades  and  any  of  the 
other  criteria  was  one  of  .593  obtained 
between  instructor's  grades  on  "Land- 
ings" preceding  the  second  check  flight 
and  "Time  to  Solo."  It  would  appear  that 
favorable  ratings  on  landing  performance 
are  related  to  the  time  when  the  instruc- 
tor gives  the  student  permission  to  solo. 

"Time  to  Solo"  showed  a  fairly  high 
correlation  with  the  C.A.A.  inspector's 
over-all  grade,  and  a  positive  relation- 
ship with  the  C.A.A.  landing  grade. 
"Time  to  Solo"  is  most  highly  correlated 
with  the  instructor's  rating  of  skill  on 
the  Purdue  scale.  This  again  suggests 
that  the  instructor  actually  does  allow  a 
student  whom  he  feels  is  skillful,  to  solo 
early  in  the  flight  course.  In  general,  this 
time  measure  showed  fairly  high  correla- 
tions with  the  other  criteria. 

The  intercorrelations  between  check 
pilot's  grades  on  landings  in  the  four 
check  flights  were  all  positive  with  the 
exception  of  ratings  on  the  fourth  check 
flight.  In  fact,  the  correlations  between 
grades  on  landings  in  the  fourth  check 
flight show a progressively inverse relationship
to grades on the first, second,
and third check flights.

This is quite different from what might have been expected, since it seems
more logical that grades on landings should show more relationship to one
another as the time of flight instruction increases. Unreliability of the
rater might be offered as one explanation, although chance might also be
responsible for what appears to be a progressively inverse relationship. It is
possible that variation in weather conditions contributed to this
relationship.

Check  pilots  and  instructors  have  been 
unanimous  in  stating  that  one  should  not 
expect much agreement between ratings
on  one  check  flight  and  those  obtained 
from  any  other  because  of  the  vast  varia- 
tion in  the  air  conditions  under  which 
the  student  must  fly.  What  might  serve 
as  a  satisfactory  performance  on  one  day 
might  be  totally  inadequate  when 
weather  conditions  are  different. 

Table IV shows the correlations between
the predictors and each of the
criteria.  A  cursory  examination  of  this 
table  indicates  that  once  again  none  of 
the  correlations  were  extremely  high. 
Some  criteria  showed  no  significant  cor- 
relations with  any  test  in  the  battery  so 
at  this  point  an  arbitrary  decision  was 
made.  This  decision  was  to  omit  from 
further  consideration  any  criterion  which 
did  not  have  a  correlation  of  at  least  .340 
with  any  test  in  the  predictor  battery. 
This  reduced  the  number  of  criteria  fi- 
nally selected  for  study  to  thirteen. 
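
The .340 cutoff is described as arbitrary, but it can be compared with the conventional significance test for a correlation coefficient. The sketch below assumes that each correlation is based on the full sample of 37 students reported in the summary; the function name is illustrative and not from the original report.

```python
import math

def t_for_r(r, n):
    """t statistic for testing a Pearson r against zero (n - 2 degrees of freedom)."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

# ASSUMPTION: N = 37 for every correlation in Table IV.
t_cutoff = t_for_r(0.340, 37)
print(round(t_cutoff, 3))
```

Under that assumption the cutoff corresponds to a t of roughly 2.14 with 35 degrees of freedom, slightly above the tabled two-tailed five per cent value of about 2.03. The rule therefore retains only criteria having at least one correlation that would pass the five per cent level.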

Table V gives the correlations obtained
between the individual tests in the
predictor  battery  and  each  of  the  criteria 
selected  for  further  study. 

None  of  the  correlations  between  the 
tests  and  the  criteria  were  high.  The 
largest  number  of  comparatively  high 
correlations  were  obtained  between  tests 
in  the  predictor  battery  and  the  C.A.A. 
over-all  grade.  The  Biographical  Inven- 
tory showed  the  largest  number  of  com- 
paratively high  correlations  with  the 
various  criteria,  and  appeared  to  be  the 
most useful predictor as far as data from
the  check  flights  were  concerned. 

The  intelligence  tests  were  not  high- 
ly correlated  with  the  criteria,  and 
the  Judgment-Reaction  Test  generally 
showed  low  positive  correlations.  Both 
the  Test  of  Aviation  Information  and 
the  Test  of  Mechanical  Comprehension 
were  correlated  most  highly  with  cri- 
terion data  obtained  from  the  private 
pilot  test,  although  the  correlations  with 
other  criteria  were  low.  The  Desire  to 
Fly  Inventory  was  likewise  correlated 
highly  with  the  C.A.A.  Inspector's  data, 
but  its  highest  correlation  was  with  the 
rating  of  Emotional  Control.  Of  the  two 
coordination  tests  included  in  the  bat- 
tery, the  Two-Hand  Coordination  Test 
appeared  to  have  the  most  significant 
relationship  to  the  criteria. 

Generally,  the  correlations  showed 
considerable  variation  and  little  predic- 
tive significance  can  be  attached  to  them. 
Although  the  criteria  used  differed  from 
those  in  the  present  research,  much  the 
same  type  of  results  were  found  in  the 
Boston-Midwest  Project  when  correla- 
tions were  computed  between  individual 
predictors  and  criteria  (20). 

Table  VI  shows  the  partial  regression 
coefficients  which  give  the  relative 
weighting  of  the  nine  factors  measured 
by  the  test  battery  in  predicting  the 
various  criteria.  The  line  second  from 
the  bottom  of  the  table  contains  the 
multiple  correlations,  which  were  ob- 
tained between  the  test  battery  and  each 
criterion.  The  bottom  row  contains  the 
corrected  multiple  correlation  co- 
efficients. 
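
The report does not state which correction was applied to the multiple correlations. As an illustration only, the sketch below uses the common adjusted-R-squared shrinkage formula, with N = 37 students and k = 9 tests, applied to the highest and lowest multiple correlations reported in the summary (.707 and .512). The values it produces are not claimed to match the bottom row of Table VI.

```python
import math

def shrunken_r(r, n, k):
    """Multiple R corrected for shrinkage.

    ASSUMED formula: R_adj^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1),
    where n is the sample size and k the number of predictors.
    A negative adjusted R^2 is reported as zero.
    """
    r2_adj = 1.0 - (1.0 - r * r) * (n - 1) / (n - k - 1)
    return math.sqrt(max(r2_adj, 0.0))

highest = shrunken_r(0.707, 37, 9)
lowest = shrunken_r(0.512, 37, 9)
print(round(highest, 3), round(lowest, 3))
```

With nine predictors and only 37 cases the correction is severe, which is consistent with the later remark that the corrected coefficients do not warrant specific conclusions about use of the battery with another sample.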

In  general,  no  one  test  was  weighted 
consistently  more  heavily  than  any  of  the 
other  tests  in  predicting  the  criteria.  As 
far  as  criteria  obtained  from  C.A.A.  In- 
spectors were  concerned,  three  tests  had 
the heaviest weighting. These were the
Test of Aviation Information, the Desire
to Fly Inventory, and the Two-Hand
Coordination Test. Although all the tests
were  heavily  weighted  for  some  of  the 
criteria,  the  weighting  was  not  consistent. 
With  the  possible  exception  of  the  Mash- 
burn  and  the  Ohio  State  Psychological 
Examination,  the  test  battery  showed 
promise  of  value  in  the  prediction  of 
the  criteria  when  the  weights  were  prop- 
erly adjusted  for  each  test  in  the  bat- 
tery. 

The  Judgment-Reaction  Test  proved 
to  be  more  important  in  predicting  cri- 
teria based  on  performance  of  specific 
landing  maneuvers  than  on  the  over-all 
grade given by the C.A.A. Inspector. The
implication  is  that  whatever  factors  are 
measured  by  the  test  are  involved  in 
landing  a  plane. 

Examination  of  the  multiple  correla- 
tions at  the  bottom  of  Table  VI  shows 
that  the  test  battery  predicted  perform- 
ance on  the  C.A.A.  private  pilot  exami- 
nation far  better  than  it  predicted  per- 
formance on  any  of  the  other  criteria. 
This  was  the  only  correlation  which  was 
significant  at  the  one  per  cent  level. 
However,  seven  of  the  correlations 
proved  to  be  significant  at  the  five  per 
cent  level. 

DISCUSSION   OF  RESULTS 

One  of  the  purposes  of  this  research 
was  to  examine  the  possibility  of  analyz- 
ing the  factors  involved  in  learning  to 
fly.  It  appears  from  an  examination  of 
these  results  that,  at  least  for  the  group 
of  students  studied,  and  using  the  cri- 
teria which  were  available  in  this  study, 
it  is  possible  to  discover  what  some  of 
these  factors  are.  To  discover  all  of  the 
factors involved would be a task far
beyond  the  scope  of  the  present  investi- 


gation in  which  the  desire  was  only  to 
demonstrate  that  such  a  procedure  was 
feasible. 

In  general,  the  corrected  correlation 
coefficients  were  not  sufficiently  high  to 
warrant  specific  conclusions  regarding 
the predictive value of this battery if
used  with  another  sample. 

It  was  thought  that  perhaps  the  test 
battery  would  predict  more  successfully 
at  different  stages  in  the  student's  flight 
training.  In  other  words,  it  might  predict 
early  or  ultimate  success  but,  perhaps, 
not  both.  Its  most  successful  prediction 
was  for  the  over-all  rating  at  the  end  of 
the  course.  Other  than  this,  there  is  no 
basis  for  assuming  that  the  battery  was  of 
more  value  in  predicting  performance  at 
one stage of flight training than at
any  other.  In  addition,  there  was  no  con- 
sistency in  the  weighting  of  the  various 
tests  at  different  time  intervals  through- 
out the  flight  course.  Furthermore,  it 
must  be  remembered  that  the  private 
pilot  test  was  the  only  measure  which 
included  ratings  on  more  than  one  type 
of  performance. 

There  was  no  consistent  relationship 
between  the  accuracy  of  prediction  and 
the  number  of  scores  involved  in  the 
rating.  For  example,  ratings  on  Emo- 
tional Control  were  made  only  once,  and 
a  multiple  correlation  of  .634  (significant 
at  the  1%  level)  was  obtained  between 
the  test  battery  and  this  criterion.  Rat- 
ings by  instructors  on  landing  grades 
were  based  on  averages  of  a  number  of 
different  landings,  yet  on  the  sample  of 
landings  immediately  preceding  the  third 
check  flight,  the  multiple  correlation  was 
only  .512,  a  correlation  that  might  have 
arisen  by  chance. 

The  fact  that  there  was  no  consistency 
in  the  weighting  of  items  for  the  various 
criteria  offers  several  possibilities.   First 
of all, it may be that each maneuver, even though it is a landing maneuver,
actually is a different performance. This might explain differences in
weighting between regular landings and landings with a slip. However, on
regular landings, there were different weightings for the various tests.
Furthermore, landings in different check flights were weighted differently in
spite of the fact that instructors were supposedly rating on a fixed scale.

Differences in weights were more apparent when different raters were concerned
than when the same rater was involved. This finding is in accord with what has
been reported previously in this investigation, namely, that there was little
agreement between different raters' ratings of similar performances.

Further evidence that instructors found it difficult to adhere to a fixed
scale in grading performance is given by the differences in weighting of items
in the instructor's landing grades taken before the first and third check
flight. This discrepancy is not so apparent in check pilot's ratings. Such
observations are important for the check pilot is in a position to be more
objective in his grading than is the instructor who must ride with the student
every day.

It is quite possible, however, that much of the inconsistency between ratings
on what appear to be similar maneuvers, may be due to weather variations. For
example, in quiet air a student might receive a satisfactory grade on landing
performance, yet if he were to repeat the same performance in rough air, his
grade would be lower. The landing performance in rough air would involve quite
different skills than the landing made in still air. This suggests that some
method should be devised whereby variations in weather conditions could be
included in the rating sheet.

It is impractical to state that there is but one criterion of ability to pilot
light aircraft. For a number of years, research on pilot selection was based
upon a pass-fail criterion, but this has proved to be of less and less value
(25). Rather than to predict success or failure on such a gross basis, a
successful battery should have use in predicting performance on specific
aspects of flight performance, and it was one of the purposes of this research
to demonstrate that certain aspects of flight training might be predicted by a
test battery. It appears that the abilities required for the performance of
different maneuvers are themselves different. Therefore, one of the uses for
this test battery would appear to be in demonstrating students' weaknesses to
the instructor before any lessons were given. Furthermore, should a flight
course be designed in such a manner that certain specific aspects of flight
performance were to be stressed (rather than over-all ability to fly), a
battery such as this would be useful in determining applicants' fitness for
the course.

SUMMARY AND CONCLUSION

In this research, a battery of nine psychological tests was administered to
thirty-seven male flight students. Six of these tests were paper and pencil
tests and three were psychomotor tests. Ratings on various aspects of flight
performance during a course in primary flight training were available for the
sample studied. Intercorrelations among all of the variables were computed,
and regression weights and multiple correlations were computed for thirteen
selected criteria.

On the basis of the results obtained in this study, it may be stated that:
1. It was possible to assemble a battery of tests which would be of use in
analyzing the factors involved in learning to fly light aircraft. Those
factors, as measured by tests, were found to be:

Intelligence
Aviation Information
Biographical Information
Mechanical Comprehension
Desire to Fly
Two-hand Coordination
Ability to make Serial Reactions (Eye-hand-foot)
Ability to make rapid judgment reactions.

2.  The  amount  of  information  about 
aviation  which  the  flight  student  has  at 
the  beginning  of  the  course  seems  to  be 
highly  related  to  the  grade  he  receives 
on  the  private  pilot  flight  test.  For  dif- 
ferent criteria,  however,  the  various  fac- 
tors measured  by  the  test  battery  must  be 
weighted  differently. 

3.  Although  this  test  battery  produced 
multiple  correlations  ranging  from  .512 
to  .707  with  thirteen  separate  criteria, 
the  results  indicate  that  the  test  battery 
was  most  successful  in  predicting  when 
the  criterion  used  was  an  over-all  rating 
of  flight  performance,  rather  than  a  rat- 
ing of  performance  on  a  specific 
maneuver. 

4.  Some  writers  (8)  have  expressed 
doubt  regarding  the  effectiveness  of 
psychomotor  tests  in  the  prediction  of 
flight  success.  Within  the  limits  of  this 
study,  it  is  possible  to  say  that  they  can 
be  used  for  this  purpose  in  combination 
with  paper  and  pencil  tests. 

5. A new test was included in the battery
used in this study. This test was
constructed  on  the  hypothesis  that  it  was 
possible  to  construct  a  test  which  would 
measure  factors  related  to  a  specific  flight 


maneuver,  in  this  case,  landing.  In  all 
but  one  case,  this  test  contributed  to  the 
prediction  of  performance  on  this 
maneuver.  It  was  much  more  related  to 
the  performance  of  specific  maneuvers 
than  to  the  performance  of  a  series  of 
flight  maneuvers  combined  into  a  flight 
test.  It  is  possible  that  revised  methods 
of  scoring  this  test  might  produce  more 
fruitful  results. 

6.  In  all  cases,  the  use  of  a  battery  of 
tests  was  of  considerably  more  predictive 
value  than  any  one  of  the  tests  used 
alone. 

7.  It  was  found  that  trained  raters 
differ  from  one  another  in  the  factors 
which  they  stress  in  evaluating  perform- 
ance, and  in  any  future  study  attempting 
to  evaluate  performance  in  flying,  care 
should  be  taken  to  train  raters  even  more 
thoroughly  in  the  techniques  of  rating. 
Furthermore,  some  indication  of  the 
weather  conditions  under  which  each 
maneuver  was  performed  would  be  of 
value  in  evaluating  a  student's  perform- 
ance on  the  same  maneuver  under  dif- 
ferent weather  conditions. 

8.  Since  ratings  made  by  individuals 
not  having  constant  contact  with  flight 
students  seemed  to  be  somewhat  more 
consistent  than  ratings  made  by  flight 
instructors  who  were  in  constant  contact 
with  the  student,  it  seems  desirable  that 
relations  between  the  rater  and  his  sub- 
ject should  be  kept  on  as  objective  a 
basis  as  possible. 

9.  It  might  be  well  in  future  research 
to  use  over-all  ratings  obtained  from  sev- 
eral sources.  In  this  study,  only  one  over- 
all measure  was  available.  The  use  of 
measures  obtained  from  various  sources 
would  make  it  possible  to  investigate  the 
possibility that this same battery would
predict as successfully, or more successfully,
for other over-all ratings. Furthermore,
it would seem desirable in future
work  to  use  more  varied  types  of  specific 
criteria  rather  than  to  study  only  landing 
performance.  The  criteria  should  be  as 
independent  and  objective  as  possible. 

10.  A  further  suggestion  for  future  re- 
search would  be  to  submit  all  of  the  tests 
included  in  the  battery  to  such  a  test 
selection  technique  as  the  Wherry- 
Doolittle,  in  order  to  ascertain  if  certain 


of  the  tests  which  were  found  to  be  un- 
important in  some  of  the  criteria,  might 
be  eliminated  from  the  battery. 

11.  It  is  suggested  that  a  battery  such 
as  this  might  be  useful  in  indicating 
students'  potential  weaknesses  to  the  in- 
structor so  that  these  students  might  be 
eliminated  before  training,  or,  if  pos- 
sible, instruction  might  be  directed 
toward  overcoming  these  weaknesses. 


THIS study was undertaken in an effort
to improve the prediction of
success  in  learning  to  pilot  light  aircraft. 
It  featured  a  new  psychomotor  test 
which,  it  was  hoped,  would  contribute 
materially  to  the  solution  of  this  timely 
problem. 

STATEMENT OF PROBLEM

Procedures  in  this  investigation  were 
planned  to  test  the  hypothesis  that  the 
ability  to  perceive  and  react  differentially 
to  visual  cues,  randomly  sampling  the 
visual  field,  is  related  to  the  piloting  of 
light  aircraft.  This  hypothesis  was  based 
on several considerations: 1. the failure
of  simple  reaction  time  as  a  predictor  of 
success  in  learning  to  fly;  2.  the  com- 
parative success  of  complicated  reaction 
time  as  a  predictor  of  flight  performance; 
3.  the  observations  of  the  writer,  and  of 
other  fliers  that,  (a)  correct  responding 
within  reasonable  time  limits  seemed 
more  important  in  piloting  than  mere 
quickness  of  reaction,  and  that  (b)  many 
of  the  responses  required  for  successful 
piloting  had  to  be  made  to  indirect  visual 
cues  perceived  as  an  inseparable  part  of 
the  total  configuration. 

THE INDIRECT VISION TEST

The  picture  on  the  following  page  will 
be  helpful  in  understanding  the  Indirect 
Vision  Test  Situation.  The  subject  was 
seated  on  the  adjustable  stool  in  front  of 
the  beaver-board  screen.  Behind  the 
rectangular  opening  in  the  center  was  a 
tachistoscope  that  presented  a  series  of 
simple    discrimination    problems.    This 


constituted  the  foveal  unit.  Behind  the 
small  round  openings  arranged  in  con- 
centric circles  around  the  center  were 
hooded  incandescent  bulbs  which  could 
be  activated  in  random  order.  These  con- 
stituted the  parafoveal  unit  of  the 
apparatus. 

The  "Instructions  to  the  Subject" 
further  reveal  the  general  nature  of  the 
indirect  vision  test: 

This  is  a  test  of  your  ability  to  see  and 
react  to  flashes  of  light  at  different  points  in 
the  visual  field.  When  a  light  flashes  on  the 
screen  in  front  of  you,  you  are  to  place  the 
stick  in  the  corresponding  notch  in  the  small 
board  directly  in  front  of  you.  (Demonstrate) 

Use  the  right  hand  to  move  the  stick.  Hold 
it  in  any  manner  that  is  convenient  for  you, 
but  do  not  grasp  it  too  tightly.  Return  the 
stick  to  the  center  position  immediately  after 
each  response  so  that  you  will  be  ready  for 
the  next  one. 

To  help  you  keep  looking  straight  ahead 
throughout  the  test,  you  will  look  through 
the  small  lighted  opening  in  the  center  of 
the  screen.  Behind  this  opening  will  appear 
successive  rows  of  three  squares.  When  the 
X  appears  in  the  middle  square,  you  are  to 
push  the  button  on  the  table  to  your  left.  Use 
the  left  hand.  Keep  the  left  arm  resting  on 
the  table  and  a  finger  on  the  button  so  as  to 
be  ready  when  the  X  appears  in  the  middle 
square. 

You  can  ask  about  anything  that  you  don't 
understand. 

You  will  be  given  a  practice  run  of  about 
one  minute,  after  which  you  will  be  given 
the  opportunity  to  ask  about  anything  that 
is not clear to you.

The  test  run  will  continue  for  about  ten 
minutes. 

The sole intended function of the foveal task was that of insuring uniform
fixation during the test and between individuals tested. The problem was
reduced to a level of facility that gave, with very rare exceptions, perfect
performances. No record was made of responses to the foveal stimuli, but a
buzzer signaled that a response was being made.

[Figure: Indirect Vision Test, Front View]

The  foveal  unit  of  the  apparatus  con- 
sisted of  a  weight-driven,  pendulum- 
controlled,  rotating-drum  tachistoscope. 
The  tachistoscope  was  placed  behind  the 
screen  and  the  successive  stimuli  were 
observable  to  the  subject  through  a  small 
rectangular  opening  (1"  x  2")  in  the  cen- 
ter of  the  screen.  The  drum  rotated  and 
gave  a  new  exposure  every  2  seconds. 
Illumination  of  the  foveal  stimulus  was 
provided  by  a  20  watt  incandescent  bulb, 
hooded,  and  mounted  two  inches  above 
and  three  inches  in  front  of  the  rotating 
drum. 


The  constancy  of  eye-position  was  con- 
trolled by  an  adjustable  stool  and  a  hori- 
zontally fixed  headrest  26  inches  from  the 
screen. 

The  stimuli  for  the  indirect  vision  dis- 
crimination task  were  flashes  of  light 
through  twenty-one  openings  in  the 
screen.  These  openings,  three  quarters  of 
an  inch  in  diameter,  were  distributed  in 
circles  around  the  center  of  the  screen  at 
such  distances  as  to  give  visual  angles  of 
47°, 53°, and 48°. Because of the interference
of the response selector the bottom
column of lights was deleted, leaving
twenty-one  openings. 

The light flashes were produced by successively activating one or
another of the 15 watt bulbs mounted and hooded behind the openings.
An alternating current of 10 volts provided the energy for the
flashes. The rate and duration of the flashes were held constant by a
high-fidelity electric motor, operating on alternating current. The
flashes occurred every 2.2 seconds and the bulbs were activated for a
period of .9 of a second. The order in which the lights flashed was
controlled by an electromagnetic rotating circuit selector, the
pattern having been determined by random selection.

A correct response consisted in placing the stick, within 1.1
seconds, in the notch corresponding to the direction of the light
flash. Correct responses were automatically recorded on an
electromagnetically operated pen-polygraph. A buzzer, placed in
parallel with the recording magnet, signaled the subject that a
correct response had been made.

The  test  ran  for  a  continuous  series  of 
12  cycles  of  the  stimulus  pattern.  Each 
cycle  flashed  25  lights  which  made  a  total 
of  300  discrete  stimulations.  Cycles  were 
recorded  by  an  electric  counter  and  were 
automatically  marked  on  the  polygraph 
tape. 

The  measure  taken  as  the  score  was 
the  total  number  of  errors  or  lights 
missed. 
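
The timing and scoring rules just described can be sketched in a few lines of Python. This is only an illustration, not the logic of the apparatus: the `Trial` record and the function names are hypothetical, and it is assumed that a wrong, late, or missing response each counts as one error.

```python
from dataclasses import dataclass
from typing import Optional

FLASH_INTERVAL_S = 2.2    # one flash every 2.2 seconds (from the text)
RESPONSE_LIMIT_S = 1.1    # stick must reach the notch within 1.1 seconds
CYCLES = 12               # cycles of the stimulus pattern
FLASHES_PER_CYCLE = 25    # lights flashed in each cycle


@dataclass
class Trial:
    """One light flash and the response to it (hypothetical record)."""
    light_direction: str
    response_direction: Optional[str]   # None if no response was made
    response_time_s: Optional[float]    # seconds after the flash began


def is_correct(trial: Trial) -> bool:
    """Right notch, reached inside the response limit (assumed inclusive)."""
    return (
        trial.response_direction == trial.light_direction
        and trial.response_time_s is not None
        and trial.response_time_s <= RESPONSE_LIMIT_S
    )


def error_score(trials) -> int:
    """Score as described in the text: total errors or lights missed."""
    return sum(1 for t in trials if not is_correct(t))


def total_stimuli() -> int:
    return CYCLES * FLASHES_PER_CYCLE


def run_time_minutes() -> float:
    return total_stimuli() * FLASH_INTERVAL_S / 60
```

With the figures given in the text, `total_stimuli()` is 300 and `run_time_minutes()` comes to about 11, which agrees with the "about ten minutes" announced to the subject.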

The screen was of light brown beaverboard, and the general lighting
for the test room was provided by an exposed 25 watt light bulb,
suspended [illegible]½ feet to the rear and 5 feet above the
eye-position of the subject. No daylight was used.

The reliability of the indirect vision test was checked by the
split-half and by the test-retest methods. Scores on odd numbered
cycles correlated with scores on even numbered cycles .925 (N = 265).
The test-retest coefficient of reliability was .784, based on the
scores of 130 subjects who took the test a second time after a
one-week interval. A likely interpretation of the discrepancy between
the coefficients is that the difference is largely due to personal
factors which vary from one week to the next. The split-half
technique would be expected to render a close approximation of the
true internal consistency of the test, by more closely controlling
the subjective factors. Nevertheless, it is the lower reliability
measure that operates in the determining of validity.
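
The odd-even comparison described above can be illustrated with a short Python sketch. It assumes, hypothetically, that each subject's record is a list of twelve per-cycle error counts; the function names are chosen for this example only, and the data in the usage note are invented.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def split_half_r(errors_by_subject):
    """Correlate errors on odd-numbered cycles with errors on even ones.

    errors_by_subject: one list of per-cycle error counts per subject
    (an assumed data layout, not taken from the source).
    """
    odd = [sum(cycles[0::2]) for cycles in errors_by_subject]   # cycles 1, 3, 5, ...
    even = [sum(cycles[1::2]) for cycles in errors_by_subject]  # cycles 2, 4, 6, ...
    return pearson_r(odd, even)


# Invented records for four subjects, twelve cycles each.
toy_records = [
    [2, 1, 3, 2, 2, 1, 0, 1, 2, 2, 1, 1],
    [5, 6, 4, 5, 7, 5, 6, 4, 5, 6, 5, 4],
    [0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1],
    [3, 4, 3, 2, 4, 3, 3, 4, 2, 3, 3, 2],
]
print(round(split_half_r(toy_records), 3))
```

The same `pearson_r` function applied to first-session and second-session totals would give a test-retest coefficient of the kind reported for the 130 retested subjects.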

The widespread distribution of scores on this test suggests that the
individuals tested differed markedly in whatever ability or
combination of abilities was being used.

OTHER  TESTS  USED 

The design of the experiment was extended to permit observation of
the operation of the Indirect Vision Variable when combined with a
selected group of pre-flight tests. The tests selected for study in
connection with the Indirect Vision Test were:

1. The Self-Administering Test of Mental Ability (Gamma A. M. Otis
   Quick-scoring)
2. Test of Mechanical Comprehension (Form B, C.A.A.)
3. Desire to Fly (Form XPA)
4. Test of Aviation Information (Form P)
5. Two-Hand Coordination Test

These tests are a sampling of the better paper-pencil and psychomotor
predictors now available. The study by Lane shows that all of the old
tests in this battery have some relationship to various measures of
pilot success. (13)

THE   CRITERIA 

Our chief interest in this study was the student's ability to fly at
or near the end of the normal training period. Consequently most of
the criteria selected for use were measures and ratings made during
this later period of training.

Table I
Intercorrelations of Predictors (N = 88)

                             Indirect    Mental      Mechanical  Desire      Aviation    Two-Hand
                             Vision      Ability     Compre-     to Fly      Informa-    Coordina-
                             Test        Test        hension     Inventory   tion Test   tion Test
Indirect Vision Test                      .039        .103        .264        .025       -.058
Mental Ability Test                                   .393        .129        .461        .450
Mechanical Comprehension                                          .169        .395        .498
Desire To Fly Inventory                                                       .068        .083
Aviation Information Test                                                                 .199
Two-Hand Coordination Test

The only specific maneuver separately used for validation was that of
landing the plane. Landings were selected for two reasons: (1)
landing is universally considered one of the crucial aspects of
learning to fly, and (2) landing would seem to require the ability to
make discriminatory responses to indirect visual cues perceived in
relation to the total visual pattern.

The criteria included:

C.A.A. Flight Inspectors' Overall Grades
C.A.A. Flight Inspectors' Demerit Scores on Landings
Check Pilots' Overall Grades on Check Flights 1, 2, 3, and 4
  (O.S.F.I.)
An Average of the Check Pilots' Grades on Landings, 4th Check Flight
Instructors' Ratings of Skill, Judgment, and Emotional Stability
  (Using the Purdue "Scale for Rating Pilot Competency")

A  detailed  discussion  of  these  criteria  is 
included  in  the  study  by  Lane  reported 
in  the  first  part  of  this  monograph. 

RESULTS   AND   DISCUSSION    OF    RESULTS 

The results of this experiment are summarized in the four tables. The
correlation coefficients were computed by the product-moment method
from raw scores. Table I presents the intercorrelations of the
predictors. Inter-correlations among the criteria are presented in
Table II. Table III gives the validity coefficients for the
predictors when used


Table II
Intercorrelation of Criteria (N = 88)

Columns are numbered to match the rows.

                                   (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)     (10)
1. C.A.A. Overall Grade            .583    .227    .119    .229    .507    .298    .446    .165    .270
2. C.A.A. Landing Score                    .170    .128    .220    .291    .162    .206    .145   [illegible]
3. O.S.F.I. Overall, Check #1                      .137    .422    .210    .133    .470    .129    .431
4. O.S.F.I. Overall, Check #2                             -.075    .382    .420    .385    .085    .190
5. O.S.F.I. Overall, Check #3                                      .203   -.084    .366    .064    .385
6. O.S.F.I. Overall, Check #4                                              .756    .542    .428    .241
7. O.S.F.I. Landings, Check #4                                                     .243    .407    .474
8. Purdue Scale, Skill                                                                     .308    .692
9. Purdue Scale, Judgment                                                                          .354
10. Purdue Scale, Emotion



RONALD    R.    GREENE 


Table III
Correlations Between Individual Predictors and Criteria (N = 88)

                             C.A.A.   C.A.A.   O.S.F.I. O.S.F.I. O.S.F.I. O.S.F.I. O.S.F.I. Purdue   Purdue   Purdue
                             Overall  Landing  Overall  Overall  Overall  Overall  Landings Scale    Scale    Scale
                             Grade    Score    Check #1 Check #2 Check #3 Check #4 Check #4 Skill    Judgment Emotion
Indirect Vision Test        -.089     .060     .068     .136    -.023     .072     .248*    .261*    .173     .176
Mental Ability Test         -.157    -.109    -.050     .249*    .046     .038     .109     .227*    .312**   .188
Mechanical Comprehension     .279**  -.065     .280**   .287**   .099     .300**   .361**   .376**   .203     .340**
Desire to Fly Inventory     -.012    -.115     .055     .083     .030    -.059     .010     .213*    .186     .153
Aviation Information Test    .195     .054     .206     .338**   .079     .301**   .302**   .283**   .027     .260*
Two-Hand Coordination Test   .006    -.129     .061     .138    -.009     .160     .210*    .206     .271**   .298**

*  Significant at the 5% level.
** Significant at the 1% level.


independently.  Table  IV  shows  the  beta 
weights  for  the  various  tests  in  relation 
to  each  of  the  ten  criteria.  The  last  row 
of  figures  in  Table  IV  shows  the  multiple 
correlations  between  the  test  battery  and 
each  criterion. 
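
The relation between the beta weights, the validity coefficients, and the multiple correlation can be checked with a short calculation. The sketch below relies on the standard identity for standardized regression weights, R² = Σ βᵢrᵢ, which the source does not spell out; the figures are those read from Tables III and IV for the C.A.A. Overall Grade, and the dictionary and function names are hypothetical.

```python
import math

# Validity coefficients (Table III) and beta weights (Table IV) for the
# C.A.A. Overall Grade criterion, as read from the tables.
validity_r = {
    "Indirect Vision": -0.089,
    "Mental Ability": -0.157,
    "Mechanical Comprehension": 0.279,
    "Desire to Fly": -0.012,
    "Aviation Information": 0.195,
    "Two-Hand Coordination": 0.006,
}
beta_weights = {
    "Indirect Vision": -0.122,
    "Mental Ability": -0.380,
    "Mechanical Comprehension": 0.382,
    "Desire to Fly": -0.006,
    "Aviation Information": 0.236,
    "Two-Hand Coordination": -0.007,
}


def multiple_r(betas, validities):
    """R from standardized weights: R^2 is the sum of beta_i * r_i."""
    r_squared = sum(betas[name] * validities[name] for name in betas)
    return math.sqrt(r_squared)


print(round(multiple_r(beta_weights, validity_r), 3))  # prints 0.472
```

The result agrees with the R of .472 given for this criterion in the last row of Table IV, which suggests the tabled weights and validities are mutually consistent.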

The  Indirect  Vision  Test  shows  a 
marked  degree  of  uniqueness.  In  only 
one  instance  is  the  inter-correlation 
above  .103.  The  .103  is  with  Mechanical 
Comprehension.  The  higher  one,  .264,  is 
with  the  Desire-to-Fly  Inventory.  Perhaps 
those  who  had  the  greatest  desire  to  fly 
were  the  most  highly  motivated  on  the 
Indirect  Vision  Test. 

Only one other test, the Desire-to-Fly Inventory, shows any
considerable degree of independence. The highest inter-correlation
for this test is the one with the Indirect Vision Test indicated
above.

The four remaining tests of the battery, namely the Mental Ability
Test, Mechanical Comprehension, Aviation Information, and the
Two-Hand Coordination Test, show a considerable commonality. The
inter-correlations for these tests range from .393 to .498. A
factorial analysis is needed to appraise the common elements in the
hope of reducing the number of tests or test items needed.

Table IV
Beta Weights and Multiple Correlations (N = 88)

                             C.A.A.   C.A.A.   O.S.F.I. O.S.F.I. O.S.F.I. O.S.F.I. O.S.F.I. Purdue   Purdue   Purdue
                             Overall  Landing  Overall  Overall  Overall  Overall  Landings Scale    Scale    Scale
                             Grade    Score    Check #1 Check #2 Check #3 Check #4 Check #4 Skill    Judgment Emotion
Indirect Vision Test        -.122     .091     .035     .109    -.049     .087     .258     .197     .143     .153
Mental Ability Test         -.380    -.117    -.264     .080     .017    -.224    -.161     .015     .280    -.069
Mechanical Comprehension     .382    -.025     .298     .159     .120     .230     .251     .246     .049     .163
Desire to Fly Inventory     -.006    -.123     .025     .002     .025    -.121    -.108     .103     .102     .064
Aviation Information Test    .236     .139     .209     .239     .049     .299     .252     .157    -.163     .176
Two-Hand Coordination Test  -.007    -.076    -.011    -.019    -.091     .102     .131     .048     .153     .216
R                            .472**   .227     .378*    .387*    .138     .425**   .489**   .471**   .422**   .436**

*  Significant at the 5% level.
** Significant at the 1% level.

A brief visual survey of Table II reveals that the inter-correlations
tend to run a little higher for the criteria than they were in the
case of the predictors. This was to be expected on several counts. In
the first place four of the criteria, the O.S.F.I. check flights,
were repetitive, and involved two successive ratings by each of the
check pilots. Moreover, no serious attempt was made to eliminate
overlapping in the selection of criteria. The tenuous nature of the
criteria did not warrant selection for mutual exclusion. In one case
the rating of a single performance is treated both as a separate
criterion and as a factor included in another. This undoubtedly goes
a long way toward explaining the comparatively high correlation
(.758) between landings of the fourth check flight and the over-all
score for the entire flight. The fourth check flight was the only one
for which a separate landing grade was used.

It is interesting and perhaps significant that the correlations
between check flights involving two ratings by the same check pilot
for each of the subjects average .402, while those between check
flights in which each student received one rating from each check
pilot average .170. It is unlikely that differences in performances
would account for more than a fraction of this discrepancy. The lower
average (.170) suggests unreliability in these ratings, and leads one
to wonder whether the higher self-consistency (.402) might not be due
to constant errors on the part of the raters. Thus, not only the
reliability of the ratings, but the validity as well, is open to
question.

The second highest intercorrelation (.692) is found between the skill
and emotion aspects of the Purdue Scale. Judgment and emotion
correlate .354, while skill and judgment show a relationship of .407.
The halo effect may be operating here, since all three aspects are
rated by the same instructor. A further investigation of the nature
of these relationships would be in order.

Apparently one of the greatest needs in aviation psychology is for
more adequate criteria of pilot skill. The photographic method may
contribute to the solution, but even this seemingly objective method
is not without its problems. Evidence so far suggests that agreement
between ratings and photographic records is about the same as between
raters. As yet there is no clear way of establishing either of these
methods as the one by which the other can be evaluated. A factorial
analysis of the data now available is a very necessary next step.

It is obvious from the data in Table III that no single test of the
battery is adequate for the prediction of success in learning to
pilot light aircraft. Those correlations which differ significantly
from zero are marked with one or two asterisks. For those marked with
two asterisks the chances are less than one in a hundred that such a
correlation would be drawn from a universe where the true value is
zero. One asterisk indicates that the chances are between 1 in 20 and
1 in 100 that such a correlation would be drawn from a universe where
the true value is zero.
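
The asterisk convention can be illustrated with a rough significance check. The source does not say which test was used to mark the tables, so the sketch below substitutes the Fisher z approximation, computed with the Python standard library; the function names are hypothetical, and coefficients lying very close to a cut-off may be classified differently than in the tables.

```python
import math
from statistics import NormalDist


def two_sided_p(r: float, n: int) -> float:
    """Approximate two-sided p-value for H0: true correlation is zero.

    Uses the Fisher z transformation: atanh(r) * sqrt(n - 3) is treated
    as a standard normal deviate. This is an approximation, not
    necessarily the procedure used in the original study.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def asterisks(r: float, n: int = 88) -> str:
    """Return '**' below the 1% level, '*' below the 5% level, else ''."""
    p = two_sided_p(r, n)
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


for r in (0.060, 0.261, 0.376):
    print(f"r = {r:.3f}  {asterisks(r)}")
```

With N = 88 this marks .261 with one asterisk and .376 with two, in agreement with the corresponding entries in Table III, and leaves .060 unmarked.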

On the whole the correlations are low. They are, on the average, a
little lower than those reported by Lane for the same tests. (13) One
thing that may help account for this lowering of relationship is the
fact that the scores for this study were taken from three quarters of
training, rather than the two used by Lane. Changing weather
conditions and chance factors have had a greater opportunity to
influence the outcomes.

While the Indirect Vision Test did not produce according to
expectation, it was shown to have a fairly consistent relationship
with the later check pilot grades and the instructor ratings. The
highest relationship (.261) is with skill, as estimated by the
student's regular instructor, and the second highest (.248) is with
skill in landing, as graded by the check pilot on the fourth check
flight. Both of these correlations are significant at the 5% level.

The Mechanical Comprehension Test leads the field in predictive
value, with seven out of the ten validity coefficients significant at
the 1% level, all being above .27. The second-best individual
predictor is the Aviation Information Test. It has correlations
significant at the 1% level with O.S.F.I. check flights two and four,
skill on the Purdue Scale, and with the landing score on the fourth
check flight.

The Two-Hand Coordination Test is of little predictive value, except
in relation to the landings on the fourth check flight and the three
ratings on the Purdue Scale. The Mental Ability Test and the
Desire-to-Fly Inventory, according to this study, are of little or no
value in predicting pilot success.

It is interesting to note that the Mechanical Comprehension Test is
the only one that bears a significant relationship with the C.A.A.
examination criteria. Perhaps the validity of these examinations
should be seriously investigated, since the granting or withholding
of the private pilot license depends upon them.

The beta weights for the Indirect Vision Test reveal that although
the original validity coefficients were low they are used almost in
toto in the final predictive value of the test battery. The greatest
contribution of the Indirect Vision Test is in the prediction of
landing ability as measured by the O.S.F.I. technique in the fourth
check flight. It may well be that the future usefulness of this test
will be highest in relation to this particular maneuver, involving as
it does a rapidly changing visual pattern.

The  Mechanical  Comprehension  Test 
will  be  seen  to  make  the  highest  average 
contribution.  Half  of  the  beta  weights  for 
this  test  are  above  .20.  It  is  especially 
strong  in  relation  to  the  C.A.A.  overall 
grade  (.382). 

The three tests accounting for the greater part of the predictive
value of the battery are the Mental Ability Test, the Test of
Mechanical Comprehension, and the Aviation Information Test. The
predictive value of a battery comprised of these three tests alone
would approach that of the six-test battery for most of the criteria.
In any case it is recommended that in any future use of these data,
tests showing beta weights of less than .10 be dropped from the
computation of R for any particular criterion.
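
The recommended screening step can be sketched as follows. The sketch assumes that "less than .10" refers to the absolute size of the weight, which the source does not state explicitly; the function name is hypothetical, and the weights are those read from Table IV for the C.A.A. Overall Grade.

```python
def retained_tests(betas: dict, cutoff: float = 0.10) -> list:
    """Keep tests whose beta weight is at least `cutoff` in absolute value.

    Treating the cutoff as an absolute value is an assumption made for
    this sketch.
    """
    return [name for name, beta in betas.items() if abs(beta) >= cutoff]


# Beta weights for the C.A.A. Overall Grade, as read from Table IV.
caa_overall_betas = {
    "Indirect Vision": -0.122,
    "Mental Ability": -0.380,
    "Mechanical Comprehension": 0.382,
    "Desire to Fly": -0.006,
    "Aviation Information": 0.236,
    "Two-Hand Coordination": -0.007,
}

print(retained_tests(caa_overall_betas))
```

For this criterion the Desire-to-Fly Inventory and the Two-Hand Coordination Test would be dropped. A reduced battery would need its weights and R recomputed from the intercorrelations; the weights in Table IV apply only to the full six-test battery.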

A survey of the multiple correlations as presented in Table IV shows
that the test battery bears no significant relationship to C.A.A.
landing scores or to the over-all scores on the third check flight.
For performance during the first and second check flights the battery
of tests would seem to have only slight predictive value. The R's for
these check flights are .378 and .387. The multiple correlations for
the remaining criteria range from .422 (Judgment on the Purdue Scale)
to .489 (landings on the fourth check flight), all being significant
at the 1% level.

RECOMMENDATIONS 

The Indirect Vision Test can be improved in several ways. In this
experiment the visual cues were presented at a constant rate.
Breaking up this rhythm might make the test more differentiating and
possibly increase its validity. Again, by modifying the instrument to
permit shifting of the positions of the signal
lights, a more adequate sampling of the total visual field could be
obtained.

The test might be greatly improved as a predictor of pilot success by
using, instead of discrete signals, a continuously changing pattern
of stimuli more analogous to that encountered in actual flight. A
moving picture of what the pilot sees in flying and landing could be
projected upon a translucent screen from the rear. Cues for
discriminatory responding would need to be selected and the subject
instructed accordingly.

It is recommended that the improved test be correlated with some
criterion of success in formation flying. With the rapid growth in
aviation, formation flying of some kind in and around airports is
becoming a civilian, as well as a military necessity.

The instrument devised for this study has possibilities as a training
device, especially if the moving picture technique were to be added.
The stimulus control and response selector parts of the apparatus
lend themselves to progressive complication, so that a series of
tasks at different levels of difficulty could be presented.

A time saving might be effected by determining the smallest necessary
sampling on the test. Separate correlations with criteria for each of
the cycles, might lead to improved validity as well as a saving in
time. On the other hand, perhaps the test should be extended to
discover what effect this might have on reliability and validity.

The apparatus can be readily modified to measure simple reaction
time, either on the polygraph or with an electric chronometer. By
eliminating the indirect stimuli and using only the central unit,
simple problems in learning and remembering can be investigated.
Without modification the apparatus could be used for studies of
learning in relation to distractions. The insertion of a rheostatic
control in the stimulus circuit would permit the study of intensity
thresholds in the different parts of the visual field. Colored
stimuli could also be provided with a minimum of difficulty.

It might be profitable to discover the value or lack of value of the
improved Indirect Vision Test in predicting success in automobile
driving and in selecting men for industrial positions that seem to
involve continuous attending to wide visual areas.

The prime need in the whole program of aviation psychology is for
improved measures of success in piloting. It is much easier to point
out the need for improved criteria, however, than it is to propose
ways and means of bringing about this desired improvement. Already
the problem has received a great deal of expert attention. It would
seem that a sufficient mass of data is available on a great variety
of criteria as to warrant a factorial analysis as a logical next
step. Such an analysis would enable us to spot the commonalities
among present criteria and to concentrate our further efforts in
those areas contributing most to flying success as it has been
measured. The results of this analysis would also be helpful as a
guide to test making and test improvement.

The introduction of polygraphic and photographic techniques would
seem to be in the direction of improved accuracy and objectivity. The
little evidence that we have regarding these techniques is not too
promising. They have the advantage of rendering a permanent record of
what took place, records that can be reviewed again and again. But
the interpretation of meaning of the records is still problematical.
A control record made by an expert pilot in the same plane, flying
the same course, at nearly the same time, is probably the best basis
for comparison. Even this technique does not take into account the
capricious nature of minor air currents.

The rating technique is still far from being outmoded. While it is
true that ratings generally show an unsatisfactory reliability, there
are ways in which the rating of pilot skill can be improved. The
training of raters is one method that has not been fully utilized.
Training can result in a clearer understanding of the specific
behavior being rated, and in a more uniform interpretation of the
meaning of the positions along the scale. Multiple ratings on the
same flight performance have been previously impossible, inasmuch as
practically all of the basic training planes have been two-seaters.
With the rapid developments in light plane designing, it would be
well to consider the possibility of having two or more raters fly
simultaneously with the subject and make individual evaluations of
the various maneuvers.

With the large number of criteria sampling the student's skill at
successive points in the training period, one is prompted to raise
the question as to whether or not some of the early criteria might
have predictive value for later performance. The intercorrelations of
the criteria in this study are a step in the direction of finding an
answer to this question. It is recommended that the results of this
experiment, along with the data from Lane's study, be used in
determining the value of measures of early flight performance in the
prediction of ultimate success or failure in learning to pilot light
aircraft.

SUMMARY

The experiment described in detail in the preceding chapters was
addressed to the general problem of improving prediction of success
in piloting light aircraft.

The specific problem was that of determining the relation between the
ability to perceive and react differentially to configurational
changes and the ability to pilot light aircraft. A test was devised
to measure the ability to perceive the total visual field and to make
selective responses to changes therein. The changes were provided by
flashes of light sampling the visual field. The intensity and
duration of these stimuli were below the threshold for after-images.
Correct directional responses were recorded electromagnetically on a
pen-polygraph. A simple foveal task was employed to control
eye-fixation. Indirect visual cues were presented at the rate of one
every 2.2 seconds. To be recorded the correct response had to be made
within 1.1 seconds after the stimulus occurred. Reliability
coefficients for this test were satisfactory for its use.

Scores on this test were correlated with ten criteria of success in
flying. The r's are low, but compare favorably with the predictive
value of other pre-flight tests. Inter-correlations between the
Indirect Vision Test and five more-or-less widely used pre-flight
tests reveal a minimum of over-lapping. The beta weights assigned to
the Indirect Vision Test, when all six predictors are used as a
battery, further substantiate its unique contribution.

The R's between the battery of tests and the various criteria are
comparable to those of similar batteries. They are sufficiently high
and significant in relation to eight out of the ten criteria as to
warrant  use  of  the  battery  in  practical 
situations,  pending  the  development  of 
better  predictors. 

The correct evaluation of predictors is still problematic, due to
weaknesses in the criteria of success in flying. It is possible that
the Indirect Vision Test, and the battery of which it is a part, will
show higher validity when related to more adequate criteria. On the
other hand, the validity coefficients may be lowered by the
improvement of criteria.

Insofar as the criteria used in this study are true measures of pilot
ability, performances on the Indirect Vision Test and on the test
battery have something in common with the successful piloting of
light aircraft.

Suggestions have been made for improving the Indirect Vision Test and
for investigating its usefulness in fields other than aviation. It
has also been pointed out that the results of this experiment provide
a basis for the investigation of additional problems in aviation
psychology.